METHOD AND EQUIPMENT FOR OBTAINING SPECTRUM COEFFICIENTS FOR AN AUDIO SIGNAL REPLACEMENT FRAME, AUDIO DECODER, AUDIO RECEIVER AND SYSTEM FOR AUDIO SIGNAL TRANSMISSION
Patent abstract:
"METHOD AND EQUIPMENT FOR OBTAINING SPECTRUM COEFFICIENTS FOR AN AUDIO SIGNAL REPLACEMENT FRAME, AUDIO DECODER, AUDIO RECEIVER AND SYSTEM FOR AUDIO SIGNAL TRANSMISSION". An approach is described for obtaining spectrum coefficients for a replacement frame (m) of an audio signal. A tonal component of the audio signal spectrum is detected based on a peak that exists in the spectra of the frames prior to the replacement frame (m). For the tonal component of the spectrum, a spectrum coefficient for the peak (502) and its surroundings in the spectrum of the replacement frame (m) is predicted, and for the non-tonal component of the spectrum, a non-predicted spectrum coefficient for the replacement frame (m) or a corresponding spectrum coefficient from a frame prior to the replacement frame (m) is used. Fig. 1
Publication number: BR112015032013B1
Application number: R112015032013-9
Filing date: 2014-06-20
Publication date: 2021-02-23
Inventors: Christian Helmrich; Goran Markovic; Bernd Edler; Ralph Sperschneider; Janine Sukowski; Wolfgang Jaegers; Ralf Geiger
Applicant: Fraunhofer-Gesellschaft zur Förderung der Angewandten Forschung E.V.
IPC main class:
Patent description:
DESCRIPTION
[001] This invention concerns the area of transmission of encoded audio signals, more specifically a method and equipment for obtaining spectrum coefficients for an audio signal replacement frame, an audio decoder, an audio receiver and a system for transmitting audio signals. The models concern an approach for constructing a spectrum for a replacement frame based on previously received frames.
[002] In the prior art, several approaches are described for dealing with a loss of frames in an audio receiver. For example, when a frame is lost on the receiving side of an audio or speech codec, simple methods for concealing the frame loss, as described in reference [1], can be used, such as repeating the last received frame, fading out the lost frame, or muting.
[003] Additionally, in reference [1] an advanced technique using predictors in sub-bands is presented. The prediction technique is then combined with sign scrambling, and the prediction gain is used as a decision criterion per sub-band to determine which method should be used for the spectral coefficients of that sub-band.
[004] In reference [2], an extrapolation of the waveform signal in the time domain is used for an MDCT (Modified Discrete Cosine Transform) domain codec. This type of approach can work well for monophonic signals, including speech.
[005] If a delay of one frame is allowed, an interpolation of the neighboring frames can be used for the construction of the lost frame. Such an approach is described in reference [3], in which the magnitudes of the tonal components in the lost frame with index m are interpolated using the surrounding frames with indices m-1 and m+1. Side information that defines the signs of the MDCT coefficients of the tonal components is transmitted in the bit stream. Sign scrambling is used for the other, non-tonal MDCT coefficients. The tonal components are determined as a predetermined, fixed number of spectral coefficients with the highest magnitudes, that is, this approach selects the n spectral coefficients with the highest magnitudes as the tonal components.
[006] Fig. 7 illustrates a block diagram representing an interpolation approach without transmitted side information, as described for example in reference [4]. The interpolation approach operates on audio frames encoded in the frequency domain using the MDCT (Modified Discrete Cosine Transform). A frame interpolation block 700 receives the MDCT coefficients of a frame before the lost frame and of a frame after the lost frame; more specifically, in the approach described with respect to Fig. 7, the MDCT coefficients c_{m-1}(k) of the previous frame and the MDCT coefficients c_{m+1}(k) of the following frame are received at the frame interpolation block 700. The frame interpolation block 700 generates an interpolated MDCT coefficient c_m(k) for the current frame that has been lost at the receiver or cannot be processed at the receiver for other reasons, for example due to errors in the received data or the like. The interpolated MDCT coefficient c_m(k) output by the frame interpolation block 700 is applied to block 702, which performs a magnitude scaling per scale factor band, and to block 704, which performs a magnitude scaling per index set; the respective blocks 702 and 704 output the MDCT coefficient c_m(k) scaled by the factors α(k) and ã(k), respectively.
The output signal of block 702 is input to the pseudo-spectrum block 706, which generates, based on the received input signal, the pseudo-spectrum P_m(k) that is input to the peak detection block 708, which outputs a signal indicating the detected peaks. The signal provided by block 702 is also applied to the random sign flip block 712 which, responsive to the peak detection signal produced by block 708, applies a random sign flip to the received signal and provides a modified MDCT coefficient c_m(k) to the spectrum composition block 710. The scaled signal provided by block 704 is applied to a sign correction block 714 which, in response to the peak detection signal originated by block 708, applies a sign correction to the scaled signal provided by block 704 and provides a modified MDCT coefficient c_m(k) to the spectrum composition block 710 which, based on the received signals, generates the interpolated MDCT coefficient c_m(k) output by the spectrum composition block 710. As illustrated in Fig. 7, the peak detection signal of block 708 is also supplied to block 704 generating the scaled MDCT coefficient.
[007] The approach of Fig. 7 provides the spectral coefficients c_m(k) for the missing frame associated with the tonal components at the output of block 714, and at the output of block 712 the spectral coefficients c_m(k) for the non-tonal components are provided, so that in the spectrum composition block 710, based on the spectral coefficients received for the tonal and non-tonal components, the spectral coefficients for the spectrum associated with the missing frame are provided.
[008] The operation of the FLC (Frame Loss Concealment) technique described by the block diagram of Fig. 7 will now be described in more detail.
[009] In Fig. 7, basically four modules can be distinguished:
[010] a noise insertion module (including the frame interpolation 700, the magnitude scaling per scale factor band 702 and the random sign flip 712),
[011] an MDCT bin classification module (including the pseudo-spectrum 706 and the peak detection 708),
[012] a module of tonal concealment operations (including the magnitude scaling per index set 704 and the sign correction 714), and
[013] the spectrum composition 710.
[014] The approach is based on the following general formula:
[015] c_m(k) is derived from a bin-wise interpolation (see block 700 "Frame Interpolation"),
[016] α(k) is derived by an energy interpolation using the geometric mean:
[017] per scale factor band for all components (see block 702 "Magnitude Scaling per Scale Factor Band") and
[018] per index set for tonal components (see block 704 "Magnitude Scaling per Index Set"):
[019] ■ for tonal components, it can be shown that α = cos(π·f_i), where f_i is the frequency of the tonal component.
[020] The energies E are derived based on a pseudo energy spectrum, obtained by a simple smoothing operation:
[021] s(k) is set randomly to ±1 for non-tonal components (see block 712 "Random Sign Flip"), and to +1 or -1 for tonal components (see block 714 "Sign Correction").
[022] Peak detection is performed as a search for local maxima in the pseudo energy spectrum to detect the exact positions of the spectral peaks corresponding to the underlying sinusoids. It is based on the tonality identification process adopted in the MPEG-1 psycho-acoustic model described in reference [5]. From these, an index set is defined with the bandwidth of the main lobe of the analysis window in terms of MDCT bins and the detected peak at its center.
These bins are treated as MDCT bins dominated by a sinusoid, and the index set is treated as an individual tonal component.
[023] The sign correction changes the signs of either all bins of a given tonal component, or of none. The determination is carried out using an analysis-by-synthesis, that is, the SFM (spectral flatness measure) is derived for both versions and the version with the lower SFM is chosen. For the SFM derivation, the energy spectrum is necessary, which in turn requires the MDST (Modified Discrete Sine Transform) coefficients. To keep the complexity feasible, only the MDST coefficients for the tonal component are derived, also using only the MDCT coefficients of this tonal component.
[024] Fig. 8 illustrates a block diagram of an overall FLC technique that, when compared to the approach of Fig. 7, is refined and is described in reference [6]. In Fig. 8, the MDCT coefficients c_{m-1} and c_{m+1} of the last frame prior to the missing frame and of the first frame after the missing frame are received at an MDCT bin classification block 800. These coefficients are also provided to the noise insertion block 802 and to the block 804 that estimates the MDCT for the tonal components. Block 804 also receives the output signal provided by the classification block 800, as well as the MDCT coefficients c_{m-2} and c_{m+2} of the second-to-last frame before the lost frame and of the second frame after the lost frame, respectively. Block 804 generates the MDCT coefficients c_m of the missing frame for the tonal components, and the noise insertion block 802 generates the MDCT spectral coefficients of the lost frame for the non-tonal components. These coefficients are supplied to the spectrum composition block 806, which outputs the spectral coefficients c_m for the lost frame. The noise insertion block 802 operates in response to the index set generated by the estimation block 804.
[025] The following modifications are relevant with respect to reference [4]:
[026] ■ The pseudo energy spectrum used for peak detection is derived as
[027] ■ To eliminate perceptually irrelevant or spurious peaks, peak detection is only applied to a limited spectral range, and only local maxima that exceed a threshold relative to the absolute maximum of the pseudo energy spectrum are considered. The remaining peaks are sorted in descending order of their magnitude, and a specified number of top maxima are classified as tonal peaks.
[028] ■ The approach is based on the following general formula (with the sign assigned to it):
[029] ■ c_m(k) is derived as above, but the derivation of α becomes more advanced, following the approach
[030] replacing E_m, E_{m-1} and E_{m+1} with
[031] where
[032] produces a quadratic expression in α. Thus, for a given MDCT estimate there are two candidates (with opposite signs) for the multiplicative correction factor (A1, A2, A3 are the transformation matrices). The selection of the best estimate is carried out in the same way as described in reference [4].
[033] ■ This advanced approach requires two frames before and two frames after the frame loss in order to derive the MDST coefficients of the preceding and following frames.
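To make the prior-art scheme of Fig. 7 more concrete, the following is a minimal Python sketch of the bin-wise interpolation, the per-band energy scaling and the random sign flip for non-tonal bins. It illustrates only the general idea: function and variable names are assumptions, and the index-set scaling of block 704 and the sign correction of block 714 are omitted.

```python
import numpy as np

def interpolate_lost_frame(c_prev, c_next, sfb_offsets, tonal_bins, rng=None):
    """Illustrative sketch of the prior-art FLC of references [4]/[6]:
    bin-wise interpolation of the lost MDCT frame, energy scaling per
    scale factor band and a random sign flip for non-tonal bins."""
    rng = rng or np.random.default_rng()
    c_est = 0.5 * (c_prev + c_next)                 # bin-wise interpolation (block 700)

    # energy interpolation per scale factor band using the geometric mean (block 702)
    alpha = np.ones_like(c_est)
    for b in range(len(sfb_offsets) - 1):
        lo, hi = sfb_offsets[b], sfb_offsets[b + 1]
        e_prev = np.sum(c_prev[lo:hi] ** 2)
        e_next = np.sum(c_next[lo:hi] ** 2)
        e_est = np.sum(c_est[lo:hi] ** 2) + 1e-12
        alpha[lo:hi] = np.sqrt(np.sqrt(e_prev * e_next) / e_est)
    c_scaled = alpha * c_est

    # random sign flip for non-tonal bins (block 712); the sign correction of
    # block 714 for tonal bins is not modelled here
    c_out = c_scaled.copy()
    non_tonal = np.ones(c_est.shape, dtype=bool)
    non_tonal[tonal_bins] = False
    flips = rng.choice([-1.0, 1.0], size=c_est.shape)
    c_out[non_tonal] = flips[non_tonal] * np.abs(c_scaled[non_tonal])
    return c_out
```

A call such as interpolate_lost_frame(c_m_minus_1, c_m_plus_1, [0, 4, 8, 16], tonal_bins=[5, 6]) would then yield the concealed MDCT frame for this simplified variant.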
[034] A lower-delay version of this approach is suggested in reference [7]:
[035] ■ As a starting point, the interpolation formula ĉ_m(k) = ½·(c_{m-1}(k) + c_{m+1}(k)) is reused, but it is applied to frame m-1, resulting in:
[036] ■ Then, in the result of the interpolation, the term is replaced by the estimate to be determined (here, the factor 2 becomes part of the correction factor: α = 2·cos(π·f_i)), leading to
[037] ■ The correction factor is determined by observing the energies of the two previous frames. For the energy calculation, the MDST coefficients of the previous frame are approximated as
[038] ■ Then, the sinusoidal energy is calculated as
[039] ■ Likewise, the sinusoidal energy for frame m-2 is calculated and denoted by E_{m-2}; it does not depend on α.
[040] ■ Using the energy requirement again produces a quadratic expression in α.
[041] ■ The selection process among the calculated candidates is carried out as before, but the decision rule considers only the energy spectrum of the previous frame.
[042] Another low-delay frame loss concealment in the frequency domain is described in reference [8]. The teachings of reference [8] can be summarized, without loss of generality, as:
[043] ■ Prediction using a DFT of a time signal: (a) Obtain the DFT spectrum of the decoded time domain signal that corresponds to the received frequency-domain coefficients C_m. (b) Modulate the DFT magnitudes, assuming a linear phase evolution, to predict the missing frequency-domain coefficients of the next frame, C_{m+1}.
[044] ■ Prediction using an estimated magnitude of the received frequency spectra: (a) Find C′_m and S′_m, using C_m as input, as
[045] where Q_m(k) is the magnitude of the DFT coefficient that corresponds to C_m(k).
[046] (b) Calculate:
[047] (c) Perform a linear extrapolation of the magnitude and phase:
[048] ■ Use filters to calculate C′_m and S′_m from C_m and then proceed as above to obtain C_{m+1}(k).
[049] ■ Use an adaptive filter to calculate C_{m+1}(k):
[050] The selection of the spectrum coefficients to be predicted is mentioned in reference [8] but is not described in detail.
[051] In reference [9] it was recognized that, for quasi-stationary signals, the phase difference between successive frames is almost constant and depends only on the fractional frequency. However, only a linear extrapolation of the last two complex spectra is used.
[052] In AMR-WB+ (see reference [10]) a method described in reference [11] is used. The method in reference [11] is an extension of the method described in reference [8] in the sense that it also uses the spectral coefficients available from the current frame, assuming that only part of the current frame is lost. However, the situation of a total loss of a frame is not considered in reference [11].
[053] Another low-delay frame loss concealment in the MDCT domain is described in reference [12]. In reference [12] it is first determined whether the lost P-th frame is a multi-harmonic frame. The lost P-th frame is a multi-harmonic frame if, among the K frames before the P-th frame, more than K0 frames have a spectral flatness smaller than a threshold value. If the lost P-th frame is a multi-harmonic frame, then the (P-K)-th to (P-2)-nd frames in the MDCT-MDST domain are used to predict the lost P-th frame. A spectral coefficient is a peak if its energy spectrum is greater than that of its two adjacent energy spectrum coefficients. A pseudo-spectrum as described in reference [13] is used for the (P-1)-st frame.
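The linear extrapolation of magnitude and phase that underlies several of the prior-art schemes above (references [8], [9] and [12]) can be sketched as follows. This is an assumed minimal formulation working on two complex spectra, not the exact procedure of any of those references.

```python
import numpy as np

def extrapolate_complex_spectrum(x_m2, x_m1):
    """Extrapolate the complex spectrum of the lost frame m from the complex
    spectra of frames m-2 and m-1 by continuing the per-bin magnitude trend
    and the per-bin phase advance linearly."""
    mag = np.maximum(2.0 * np.abs(x_m1) - np.abs(x_m2), 0.0)   # linear magnitude trend
    dphi = np.angle(x_m1) - np.angle(x_m2)                     # per-bin phase advance
    phase = np.angle(x_m1) + dphi                              # extrapolated phase
    return mag * np.exp(1j * phase)
```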
[054] A set Sc of coefficients is constructed from L1 energy spectrum frames as follows:
[055] Obtain L1 sets S1, ..., SL1 composed of the peaks in each of the L1 frames, the number of peaks in each set being N1, ..., NL1, respectively. Select one of the L1 sets S1, ..., SL1. For each peak coefficient mj, j = 1, ..., Ni, in the selected set, consider whether any frequency coefficient among mj, mj±1, ..., mj±k belongs to all sets of peaks. If one exists, place all frequency coefficients mj, mj±1, ..., mj±k into the frequency set Sc. If there is no frequency coefficient belonging to all other sets of peaks, directly place all frequency coefficients of one frame into the set Sc. Here k is a non-negative integer. For all spectral coefficients in the set Sc, the phase is predicted using L2 frames among the MDCT-MDST (P-K)-th to (P-2)-nd frames. The prediction is made using a linear extrapolation (when L2 = 2) or a linear fit (when L2 > 2). For linear extrapolation:
[056] where p, t1 and t2 are frame indices.
[057] Spectral coefficients that do not exist in the set Sc are obtained using a number of frames before the (P-1)-st frame, without it being specifically explained how.
[058] It is the aim of this invention to provide an improved approach for obtaining spectrum coefficients for an audio signal replacement frame.
[059] This objective is achieved by the method of claim 1, a non-transitory computer program product of claim 34, equipment of claim 35 or claim 36, an audio decoder of claim 37, an audio receiver of claim 38 and a system for transmitting audio signals of claim 39.
[060] This invention provides a method for obtaining spectrum coefficients for an audio signal replacement frame, the method comprising:
[061] detecting a tonal component of a spectrum of an audio signal based on a peak in the spectra of the frames preceding a replacement frame;
[062] for the tonal component of the spectrum, predicting spectrum coefficients for the peak and its surroundings in the spectrum of the replacement frame; and
[063] for the non-tonal component of the spectrum, using a non-predicted spectrum coefficient for the replacement frame or a corresponding spectrum coefficient of a frame prior to the replacement frame.
[064] This invention provides equipment for obtaining spectrum coefficients for an audio signal replacement frame, the equipment comprising:
[065] a detector configured to detect a tonal component of an audio signal spectrum based on a peak in the spectra of the frames prior to a replacement frame; and
[066] a predictor configured to predict, for the tonal component of the spectrum, spectrum coefficients for the peak and its surroundings in the spectrum of the replacement frame;
[067] wherein, for the non-tonal component of the spectrum, a non-predicted spectrum coefficient for the replacement frame or a corresponding spectrum coefficient of a frame prior to the replacement frame is used.
[068] This invention provides equipment for obtaining spectrum coefficients for an audio signal replacement frame, the equipment being configured to operate according to the innovative method for obtaining spectrum coefficients for a replacement frame of an audio signal.
[069] This invention provides an audio decoder, comprising the innovative equipment for obtaining spectrum coefficients for an audio signal replacement frame.
[070] This invention provides an audio receiver, comprising the innovative audio decoder.
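The construction of the set Sc in reference [12], as described rather tersely above, can be read as the following sketch. The handling of the fallback case and the choice of the reference set are assumptions made for illustration, not the exact procedure of reference [12].

```python
def build_common_peak_set(peak_sets, k=1, num_bins=None):
    """Collect into Sc the bins around each peak of a selected set whose
    neighbourhood contains a bin that is a peak in every frame; if no such
    common peak exists, fall back to all bins of one frame."""
    sc = set()
    reference = peak_sets[0]                 # the selected peak set S_i
    all_sets = [set(s) for s in peak_sets]
    for m_j in reference:
        neighbourhood = range(m_j - k, m_j + k + 1)
        if any(all(b in s for s in all_sets) for b in neighbourhood):
            sc.update(neighbourhood)
    if not sc and num_bins is not None:
        sc = set(range(num_bins))            # no common peaks: take a whole frame
    return sorted(sc)
```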
[071] This invention provides a system for the transmission of audio signals, the system comprising:
[072] an encoder configured to generate an encoded audio signal; and
[073] the innovative decoder configured to receive the encoded audio signal and to decode the encoded audio signal.
[074] This invention provides a non-transitory computer program product comprising a computer-readable medium that stores instructions that, when executed on a computer, carry out the innovative method for obtaining spectrum coefficients for a replacement frame of an audio signal.
[075] The innovative approach is advantageous because it provides a frame loss concealment for tonal signals with good quality and without introducing additional delay. The innovative low-delay codec is advantageous because it works well on both speech and audio signals and benefits, for example in an error-prone environment, from the good frame loss concealment obtained especially for stationary tonal signals. A low-delay frame loss concealment for monophonic and polyphonic signals is proposed, providing good results for tonal signals without degrading non-tonal signals.
[076] According to the models of this invention, an improved concealment of tonal components in the MDCT domain is provided. The models relate to audio and speech coding that includes a frequency-domain codec or a switched speech/frequency-domain codec, in particular for frame loss concealment in the MDCT (Modified Discrete Cosine Transform) domain. The invention, according to models, proposes a low-delay method for the construction of an MDCT spectrum for a lost frame based on the previously received frames, in which the last received frame is encoded in the frequency domain using the MDCT.
[077] According to preferred models, the innovative approach includes the detection of the tonal parts of the spectrum, for example using the second-to-last complex spectrum to obtain the correct location of a peak, using the last real spectrum to refine the decision whether a bin is tonal, and using pitch information to better detect an onset or a pitch change, where the pitch information either already exists in the bit stream or is derived on the decoder side. In addition, the innovative approach includes an adaptive determination of the width of a harmonic that is to be concealed. The calculation of the phase shift or phase difference between frames of each spectral coefficient that is part of a harmonic is also provided, and this calculation is based on the last available complex spectrum, for example the CMDCT spectrum, without the need for the CMDCT of the frame preceding the second-to-last frame. According to models, the phase difference is refined using the last received MDCT spectrum, and the refinement can be adaptive, depending on the number of consecutively lost frames. The CMDCT spectrum can be constructed from the decoded time domain signal, which is advantageous as it avoids the need for any alignment with the codec structure and allows the construction of the complex spectrum as close as possible to the lost frame by exploiting the properties of low-overlap windows. Models of the invention provide a per-frame decision whether to use time-domain or frequency-domain concealment.
[078] The innovative approach is advantageous as it operates entirely on the basis of the information already available at the receiver side when it is determined that a frame has been lost or needs to be replaced; no additional side information needs to be received, so there is no source of the additional delays that occur in prior art approaches due to the need to receive additional side information or to derive such side information from additionally received frames.
[079] The innovative approach is advantageous when compared to the prior art approaches described above, as the drawbacks of those approaches described in the following, recognized by the inventors of this invention, are avoided when the innovative approach is applied.
[080] The methods for frame loss concealment described in reference [1] are not sufficiently robust and do not produce sufficiently good results for tonal signals.
[081] The extrapolation of the waveform signal in the time domain, as described in reference [2], cannot handle polyphonic signals and requires an increased complexity to conceal all stationary tonal signals, as an exact pitch lag would have to be determined.
[082] In reference [3] an additional delay is introduced and significant side information is required. The selection of the tonal components is very simple and will choose many peaks from among the non-tonal components.
[083] The method described in reference [4] requires a look-ahead on the decoder side and therefore introduces an additional delay of one frame. Using the smoothed pseudo energy spectrum for peak detection reduces the accuracy of the peak location. It also reduces the reliability of the detection, as it will detect peaks in noise that appear in just one frame.
[084] The method described in reference [6] requires a look-ahead on the decoder side and therefore introduces an additional delay of two frames. The tonal component selection does not match tonal components in the two frames separately but uses an averaged spectrum, and will thus produce too many false positives or false negatives, however the peak detection thresholds are adjusted. The location of the peaks will not be accurate, as the pseudo energy spectrum is used. The limited spectral range for the peak search appears to be a workaround for the described problems that arise because the pseudo energy spectrum is used.
[085] The method described in reference [7] is based on the method described in reference [6] and therefore has the same drawbacks; it merely overcomes the additional delay.
[086] In reference [8] there is no detailed description of the decision whether a spectral coefficient belongs to the tonal part of the signal. However, the synergy between the detection of tonal spectral coefficients and the concealment is important, and thus a good detection of tonal components is important. In addition, the use of filters dependent on both c_m and c_{m-1} (i.e. c_m, c_{m-1} and s_{m-1}, as s_{m-1} can be calculated when c_m and c_{m-1} are available) to calculate C′_m and S′_m has not been recognized. Also, the possibility to calculate a complex spectrum that is not aligned with the structure of the coded signal, given low-overlap windows, has not been recognized. In addition, the possibility to calculate the phase difference between frames based only on the second-to-last complex spectrum has not been recognized.
[087] For the method of reference [12], at least three previous frames must be stored in memory, which significantly increases the memory requirements. The decision whether to use tonal concealment may be wrong, and a frame with one or more harmonics can be classified as a frame without multiple harmonics. The MDCT frame received last is not used directly to improve the prediction of the lost MDCT spectrum, but only in the search for tonal components. The number of MDCT coefficients to be concealed for a harmonic is fixed; however, depending on the noise level, it is desirable to have a variable number of MDCT coefficients that constitute a harmonic.
[088] In the following, models of this invention will be described in greater detail with reference to the accompanying drawings, in which:
[089] Fig. 1 illustrates a simplified block diagram of a system for transmitting audio signals that implements the innovative approach on the decoder side,
[090] Fig. 2 illustrates a flowchart of the innovative approach according to a model,
[091] Fig. 3 is a schematic representation of the MDCT overlap windows for adjacent frames,
[092] Fig. 4 illustrates a flowchart representing steps for picking a peak according to a model,
[093] Fig. 5 is a schematic representation of an energy spectrum of a frame from which one or more peaks are detected,
[094] Fig. 6 illustrates an example of an in-between frame,
[095] Fig. 7 illustrates a block diagram representing an interpolation approach without transmitted side information, and
[096] Fig. 8 illustrates a block diagram of an overall FLC technique that is refined when compared to Fig. 7.
[097] In the following, the models of the innovative approach will be described in greater detail; in the accompanying drawings, elements with the same or identical functionality are indicated by the same reference signs. In the following, models of the innovative approach will be described according to which concealment is performed in the frequency domain only if the last two received frames are encoded using the MDCT. Details regarding the decision whether to use time-domain or frequency-domain concealment for a frame loss after the reception of two MDCT frames will also be described. Regarding the models described below, it should be noted that the requirement that the last two frames are coded in the frequency domain does not reduce the applicability of the innovative approach, as in a switched codec the frequency domain will be used for stationary tonal signals.
[098] Fig. 1 illustrates a simplified block diagram of a system for transmitting audio signals that implements the innovative approach on the decoder side. The system comprises an encoder 100 that receives at an input 102 an audio signal 104. The encoder is configured to generate, based on the received audio signal 104, an encoded audio signal provided at an output 106 of the encoder 100. The encoder can provide the encoded audio signal such that frames of the audio signal are encoded using the MDCT. According to one model, the encoder 100 comprises an antenna 108 to allow a wireless transmission of the audio signal, as indicated by reference sign 110. In other models, the encoder can output the encoded audio signal provided at output 106 via a wired connection, as indicated for example by reference sign 112.
[099] The system further comprises a decoder 120 provided with an input 122 at which the encoded audio signal provided at output 106 of the encoder is received.
Decoder 120 may, according to one model, comprise an antenna 124 for receiving a wireless transmission 110 from encoder 100. In another model, input 122 may provide a connection to the wired transmission 112 for receiving the encoded audio signal. The audio signal received at input 122 of decoder 120 is applied to a detector 126 which determines whether a frame of the received audio signal that is to be decoded by decoder 120 needs to be replaced. For example, according to models, this may be the case when detector 126 determines that a frame that is to follow an earlier frame has not been received at the decoder, or when it is determined that the received frame has errors that prevent decoding at the decoder 120 side. If it is determined at detector 126 that a frame to be decoded is available, the frame is forwarded to the decoding block 128, where a decoding of the encoded frame is performed, so that at the output 130 of the decoder a stream of decoded audio frames or a decoded audio signal 132 can be output.
[100] In the event that it is determined in block 126 that the currently processed frame needs to be replaced, the frames prior to the current frame that needs a replacement, which can be buffered in the detector 126, are provided to a tonal detector 134 that determines whether the spectrum to be replaced includes tonal components or not. If no tonal components are detected, this is indicated to the noise generator/memory block 136, which generates spectral coefficients that are non-predicted coefficients and that can be generated using a noise generator or another conventional noise generation method, for example sign scrambling or the like. Alternatively, predefined spectrum coefficients for the non-tonal components of the spectrum can be obtained from a memory, for example a look-up table. Alternatively, when it is determined that the spectrum does not include tonal components, instead of generating non-predicted spectrum coefficients, corresponding spectral coefficients of one of the frames prior to the replacement frame can be selected.
[101] In the event that the tonal detector 134 detects that the spectrum includes tonal components, a respective signal is provided to the predictor 138, which predicts, according to models of this invention described later, the spectrum coefficients for the replacement frame. The respective coefficients determined for the replacement frame are provided to the decoding block 128 where, based on these spectrum coefficients, a decoding of the lost or replaced frame is performed.
[102] As shown in Fig. 1, the tonal detector 134, the noise generator 136 and the predictor 138 define equipment 140 for obtaining spectrum coefficients for a replacement frame in a decoder 120. The elements represented can be implemented using hardware and/or software components, for example suitably programmed processing units.
[103] Fig. 2 illustrates a flowchart of the innovative approach according to a model. In a first step S200 an encoded audio signal is received, for example at the decoder 120 shown in Fig. 1. The received audio signal can be in the form of respective audio frames that are encoded using the MDCT.
[104] In step S202 it is determined whether a current frame to be processed by decoder 120 needs to be replaced or not.
A replacement frame may be required on the decoder side, for example if the frame cannot be processed due to an error in the received data or the like, if the frame was lost during transmission to the receiver/decoder 120, or if the frame was not received in time at the audio signal receiver 120, for example due to a delay during the transmission of the frame from the encoder side towards the decoder side.
[105] In case it is determined in step S202, for example by detector 126 in decoder 120, that the frame currently to be processed by decoder 120 needs to be replaced, the method proceeds to step S204, in which it is further determined whether a frequency-domain concealment is to be used or not. According to a model, if pitch information is available for the last two frames and if the pitch has not changed, it is determined in step S204 that a frequency-domain concealment is desired; otherwise, it is determined that a time-domain concealment should be applied. In an alternative model, the pitch can be calculated on a sub-frame basis using the decoded signal, and again the decision is that if a pitch is present and is constant across the sub-frames, frequency-domain concealment is used; otherwise, time-domain concealment is applied.
[106] In yet another model of this invention, a detector, for example detector 126 in decoder 120, can be provided and can be configured in such a way that it further analyzes the spectrum of the second-to-last frame or of the last frame, or of both of these frames prior to the replacement frame, and decides, based on the peaks found, whether the signal is monophonic or polyphonic. In case the signal is polyphonic, frequency-domain concealment is used, regardless of the presence of pitch information. Alternatively, detector 126 in decoder 120 can be configured in such a way that it additionally analyzes the one or more frames prior to the replacement frame in order to determine whether or not the number of tonal components in the signal exceeds a predefined limit. In case the number of tonal components in the signal exceeds the limit, frequency-domain concealment is used.
[107] If it is determined in step S204 that a frequency-domain concealment should be used, for example by applying the criteria mentioned above, the method proceeds to step S206, where a tonal part or a tonal component of a spectrum of the audio signal is detected based on one or more peaks in the spectra of the previous frames, in particular one or more peaks present at substantially the same location in the spectrum of the second-to-last frame and in the spectrum of the last frame before the replacement frame. In step S208 it is determined whether there is a tonal part of the spectrum. In the case of a tonal part of the spectrum, the method proceeds to step S210, where one or more spectrum coefficients for the one or more peaks and their surroundings in the spectrum of the replacement frame are predicted, for example based on information derivable from the previous frames, in particular the second-to-last frame and the last frame. The spectrum coefficient(s) predicted in step S210 are forwarded, for example to the decoder block 128 shown in Fig. 1, so that, as shown in step S212, the decoding of the encoded audio signal frame based on the spectrum coefficients of step S210 can be performed.
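As an illustration of the decision of step S204, a minimal sketch is given below; the data structure and the polyphony threshold are assumptions made for this sketch and are not part of any codec interface described in the text.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PreviousFrameInfo:
    pitch_lag_m1: Optional[float]    # pitch lag of the last frame, if any
    pitch_lag_m2: Optional[float]    # pitch lag of the second-to-last frame, if any
    num_tonal_components: int        # tonal components found in the previous spectra

def use_frequency_domain_concealment(prev: PreviousFrameInfo,
                                     tonal_limit: int = 1) -> bool:
    """Frequency-domain concealment is chosen if a pitch is present and
    constant over the last two frames, or if the previous spectra contain
    more tonal components than an (assumed) limit, e.g. for polyphonic
    signals; otherwise time-domain concealment is used."""
    pitch_present = (prev.pitch_lag_m1 is not None
                     and prev.pitch_lag_m2 is not None)
    pitch_constant = pitch_present and prev.pitch_lag_m1 == prev.pitch_lag_m2
    polyphonic = prev.num_tonal_components > tonal_limit
    return (pitch_present and pitch_constant) or polyphonic
```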
[108] If it is determined in step S208 that there is no tonal part of the spectrum, the method proceeds to step S214, where a non-predicted spectrum coefficient for the replacement frame or a corresponding spectrum coefficient from a frame before the replacement frame is used in step S212 for decoding the frame.
[109] If it is determined in step S204 that no frequency-domain concealment is desired, the method proceeds to step S216, where a conventional time-domain concealment of the frame to be replaced is performed, and based on the spectrum coefficients generated by the process in step S216 the encoded signal frame is decoded in step S212.
[110] If it is determined in step S202 that there is no frame to be replaced in the currently processed audio signal, that is, the currently processed frame can be fully decoded using conventional approaches, the method proceeds directly to step S212 to decode the encoded audio signal frame.
[111] Below is a description of additional details according to models of this invention.
[112] Calculation of the energy spectrum
[113] For the second-to-last frame, with index m-2, the MDST coefficients S_{m-2} are calculated directly from the decoded time domain signal.
[114] For the last frame an estimated MDST spectrum is used, calculated from the MDCT coefficients C_{m-1} of the last received frame (see e.g. reference [13]).
[115] The energy spectra for frames m-2 and m-1 are calculated as follows:
[116] with: S_{m-1}(k) MDST coefficient in frame m-1, C_{m-1}(k) MDCT coefficient in frame m-1, S_{m-2}(k) MDST coefficient in frame m-2, and C_{m-2}(k) MDCT coefficient in frame m-2.
[117] The energy spectra obtained are smoothed as follows:
[118] Detection of tonal components
[119] The peaks in the last two frames (m-2 and m-1) are considered to represent tonal components. The continued existence of the peaks allows a distinction between tonal components and random peaks in noisy signals.
[120] Pitch information
[121] It is assumed that the pitch information is available:
• calculated on the encoder side and available in the bit stream, or
• calculated on the decoder side.
[122] The pitch information is used only if all of the following conditions are met:
• the pitch gain is greater than zero,
• the pitch lag is constant over the last two frames,
• the fundamental frequency is greater than 100 Hz.
[123] The fundamental frequency is calculated from the pitch lag:
[124] If there is an F0' = n·F0 for which the harmonics N > 5 are the strongest in the spectrum, then F0 is set to F0'. F0 is not considered reliable if there are not enough peaks at the harmonic positions n·F0.
[125] According to a model, the pitch information is calculated by aligning to the right boundary of the MDCT window structure illustrated in Fig. 3. This alignment is beneficial for the extrapolation of the tonal parts of a signal, as the overlap area 300, being the part that needs concealment, is also used to calculate the pitch lag.
[126] In another model, the pitch information can be transmitted in the bit stream and used by the codec in the error-free channel case, and thus comes at no extra cost for the concealment.
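The energy-spectrum calculation and smoothing of paragraphs [112] to [117] can be sketched as below. The MDST of frame m-2 and the MDST estimate of frame m-1 (obtained e.g. as in reference [13]) are taken as inputs; the exact smoothing used in the text is not reproduced, a simple three-tap smoother is assumed instead.

```python
import numpy as np

def power_spectra(c_m2, s_m2, c_m1, s_m1_est):
    """Energy spectra P(k) = C(k)^2 + S(k)^2 of frames m-2 and m-1; the MDST
    of frame m-1 is only an estimate derived from its MDCT."""
    p_m2 = c_m2 ** 2 + s_m2 ** 2
    p_m1 = c_m1 ** 2 + s_m1_est ** 2
    return p_m2, p_m1

def smooth_spectrum(p, weights=(0.25, 0.5, 0.25)):
    """Assumed three-tap smoothing of an energy spectrum (illustrative only)."""
    w = np.asarray(weights, dtype=float)
    return np.convolve(p, w / w.sum(), mode="same")
```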
[127] Envelope
[128] Below is a description of a procedure for obtaining a spectrum envelope, which is needed for the peak picking described later.
[129] The envelope of each energy spectrum in the last two frames is calculated using a moving-average filter of length L:
[130] The length of the filter depends on the fundamental frequency (and is limited to the range [7, 23]):
[131] This relation between L and F0 is identical to the procedure described in reference [14]; however, in this invention the current frame's pitch information, including a look-ahead, is used, whereas reference [14] uses a speaker-specific average pitch. If the fundamental frequency is not available or is not reliable, the filter length L is 15.
[132] Picking a peak
[133] Peaks are first sought in the energy spectrum of frame m-1 based on predefined thresholds. Based on the location of the peaks in frame m-1, the thresholds for searching the energy spectrum of frame m-2 are adapted. Thus the peaks found are present in both frames (m-1 and m-2), but their exact location is based on the energy spectrum in frame m-2. This procedure is important because the energy spectrum in frame m-1 is calculated using only an MDST estimate and thus the location of a peak is not accurate. It is also important that the MDCT of frame m-1 is used, as it is undesirable to continue tones that exist only in frame m-2 and not in frame m-1. Fig. 4 illustrates a flowchart representing the steps for picking a peak according to a model. In step S400, peaks are searched for in the energy spectrum of the last frame m-1 prior to the replacement frame based on one or more predefined thresholds. In step S402, the one or more thresholds are adapted. In step S404, peaks are searched for in the energy spectrum of the second-to-last frame m-2 prior to the replacement frame based on the one or more adapted thresholds.
[134] Fig. 5 is a schematic representation of a frame's energy spectrum from which one or more peaks are detected. In Fig. 5, the envelope 500 is illustrated; it can be determined as previously described or by other known approaches. A number of candidate peaks are represented by the circles in Fig. 5. How a peak is found among the candidate peaks will be described in more detail below. Fig. 5 illustrates a found peak 502 as well as a false peak 504 and a peak 506 representing noise. In addition, a left foot 508 and a right foot 510 of a spectrum coefficient are illustrated.
[135] According to a model, finding peaks in the energy spectrum P_{m-1} of the last frame m-1 prior to the replacement frame is done using the following steps (step S400 in Fig. 4): • a spectral coefficient is classified as a candidate tonal peak if all of the following criteria are met:
[136] the ratio between the smoothed energy spectrum and the envelope 500 is greater than a certain threshold:
[137] the ratio between the smoothed energy spectrum and the envelope 500 is greater than those of its neighbors, meaning that it is a local maximum, • the local maximum is determined by finding the left foot 508 and the right foot 510 of a spectrum coefficient k and by finding the maximum between the left foot 508 and the right foot 510. This step is necessary because, as can be seen in Fig. 5, the false peak 504 can be caused by a side lobe or by quantization noise.
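A sketch of the envelope calculation and of the candidate-peak search in P_{m-1} is given below. The moving-average length would be derived from F0 as described; the foot-based maximum search is simplified to a three-point local-maximum test, and the use of the 9.21 dB value as a single candidate threshold is an assumption made for this sketch.

```python
import numpy as np

def spectral_envelope(p, length=15):
    """Moving-average envelope of an energy spectrum; the length is limited
    to [7, 23] and defaults to 15 when no reliable pitch is available."""
    length = int(np.clip(length, 7, 23))
    kernel = np.ones(length) / length
    return np.convolve(p, kernel, mode="same")

def candidate_peaks(p_smoothed, envelope, threshold_db=9.21):
    """Bins of P_{m-1} whose smoothed energy exceeds the envelope by a
    threshold and that are local maxima (simplified foot search)."""
    ratio_db = 10.0 * np.log10((p_smoothed + 1e-12) / (envelope + 1e-12))
    peaks = []
    for k in range(1, len(p_smoothed) - 1):
        if (ratio_db[k] > threshold_db
                and p_smoothed[k] > p_smoothed[k - 1]
                and p_smoothed[k] >= p_smoothed[k + 1]):
            peaks.append(k)
    return peaks
```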
[138] The thresholds for the peak search in the energy spectrum P_{m-2} of the second-to-last frame m-2 are determined as follows (step S402 in Fig. 4): • for the spectrum coefficients k in [i-1, i+1] around a peak at index i in P_{m-1}:
[139] Threshold(k) = (Psmoothed_{m-1}(k) > Envelope_{m-1}(k)) ? 9.21 dB : 10.56 dB; • if F0 is available and reliable, then for each n in [1, N], determine k = floor(n·F0) and frac = n·F0 - k:
[140] Threshold(k) = 8.8 dB + 10·log10(0.35)
[141] Threshold(k-1) = 8.8 dB + 10·log10(0.35 + 2·frac)
[142] Threshold(k+1) = 8.8 dB + 10·log10(0.35 + 2·(1 - frac)),
[143] if k is in [i-1, i+1] around a peak at index i in P_{m-1}, then the thresholds determined in the first step are overwritten, • for all other indices:
[144] Threshold(k) = 20.8 dB
[145] Tonal peaks are found in the energy spectrum P_{m-2} of the second-to-last frame m-2 by the following steps (step S404 in Fig. 4): • a spectral coefficient is classified as a tonal peak if:
[146] the ratio of the energy spectrum to the envelope is greater than the threshold:
[147] the ratio of the energy spectrum to the envelope is greater than those of its neighbors, meaning that it is a local maximum, • local maxima are determined by finding the left foot 508 and the right foot 510 of a spectral coefficient k and by finding the maximum between the left foot 508 and the right foot 510, • the left foot 508 and the right foot 510 also define the surroundings of a tonal peak 502, that is, the spectral bins of the tonal component to which the tonal concealment method will be applied.
[148] Using the previously described method, it turns out that the right-hand peak 506 in Fig. 5 exists in only one of the frames, that is, it does not exist in both frames m-1 and m-2. Therefore, this peak is marked as noise and is not selected as a tonal component.
[149] Extraction of sinusoidal parameters
[150] For a sinusoidal signal, a shift by N/2 (the size of the MDCT hop) results in the signal
[151]
[152] Thus, there is the phase shift Δφ = π·(l + Δl), where l is the index of a peak. The phase shift therefore depends on the fractional part of the input frequency, plus an additional addition of π for odd spectral coefficients.
[153] The fractional part of the frequency Δl can be derived using a method described e.g. in reference [15]: since the magnitude of the subband signal k = l is a local maximum, Δl can be determined by calculating the ratio of the signal magnitudes in the sub-bands k = l-1 and k = l+1, that is, by evaluating:
[154]
[155] in which the following approximation of the magnitude response of a window is used:
[156]
[157] where b is the width of the main lobe. The constant G in this expression has been adjusted to 27.4/20.0 in order to minimize the maximum absolute error of the estimate. Replacing the approximate frequency response and
[158] b' = 2b
[159] leads to:
[160] MDCT prediction
[161] For all found peaks of the spectrum and their surroundings, the MDCT prediction is used. For all other coefficients in the spectrum, sign scrambling or a similar noise generation method can be used.
[162] All spectrum coefficients belonging to the found peaks and their surroundings belong to the set designated as K. For example, in Fig. 5 peak 502 was identified as a peak representing a tonal component. The surroundings of peak 502 can be represented by a predefined number of adjacent spectral coefficients, for example by the spectral coefficients between the left foot 508 and the right foot 510, plus the coefficients at the feet 508 and 510.
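The extraction of the fractional frequency and of the inter-frame phase shift can be sketched as follows. The window-response-based derivation of reference [15] is not reproduced here; a standard parabolic interpolation on the log magnitudes is used instead as a simplified stand-in, while the phase-shift formula Δφ = π·(l + Δl) is the one given above.

```python
import numpy as np

def fractional_peak_offset(mag, l):
    """Fractional offset of a spectral peak at bin l, estimated by parabolic
    interpolation of the log magnitudes of bins l-1, l, l+1 (a common
    simplification of the estimator of reference [15])."""
    a = np.log(mag[l - 1] + 1e-12)
    b = np.log(mag[l] + 1e-12)
    c = np.log(mag[l + 1] + 1e-12)
    denom = a - 2.0 * b + c
    return 0.0 if denom == 0.0 else 0.5 * (a - c) / denom

def mdct_phase_shift(l, delta_l):
    """Phase advance between consecutive frames (hop N/2) of a sinusoid at
    bin l with fractional offset delta_l: delta_phi = pi * (l + delta_l)."""
    return np.pi * (l + delta_l)
```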
[163] According to models, the surroundings of the peak are defined by a predefined number of coefficients around the peak 502. The surroundings of the peak can comprise a first number of coefficients to the left of peak 502 and a second number of coefficients to the right of peak 502. The first number of coefficients to the left of peak 502 and the second number of coefficients to the right of peak 502 can be the same or different.
[164] According to models that apply the EVS standard, the predefined number of surrounding coefficients can be determined or fixed in a first step, e.g. before detecting the tonal component. In the EVS standard, three coefficients to the left of peak 502, three coefficients to the right and peak 502 itself can be used, that is, seven coefficients altogether (this number was chosen for complexity reasons; however, any other number will also give results).
[165] According to models, the size of the surroundings of the peak is adaptive. The surroundings of the peaks identified as representing a tonal component can be modified so that the surroundings of two peaks do not overlap. According to models, a peak is always considered only together with its surroundings, and together they define a tonal component.
[166] For the prediction of the MDCT coefficients in a lost frame, the energy spectrum (the magnitude of the complex spectrum) of the second-to-last frame is used:
[167] The lost MDCT coefficient in the replacement frame is estimated as:
[168] The following describes a method for calculating the phase φ_m(k) according to a model.
[169] Phase prediction
[170] For all found peaks in the spectrum, the fractional frequency Δl is calculated as described above and the phase shift is:
[171] Δφ is the phase shift between frames. It is the same for the coefficients at a peak and in its surroundings.
[172] The phase for each spectrum coefficient at the peak position and in the surroundings (k in K) is calculated in the second-to-last received frame using the expression:
[173] The phase in the lost frame is predicted as:
[174]
[175] According to a model, a refined phase shift can be used. Using the calculated phase Φ_{m-2}(k) for each coefficient of the spectrum at the peak position and in the surroundings, an estimate of the MDST in frame m-1 can be derived as:
[176] with:
[177] Q_{m-2}(k) energy spectrum (magnitude of the complex spectrum) in frame m-2.
[178] From this MDST estimate and the received MDCT, a phase estimate in frame m-1 is derived:
[179] The estimated phase is used to refine the phase shift:
[180] with:
[181] Φ_{m-1}(k) - phase of the complex spectrum in frame m-1, and
[182] Φ_{m-2}(k) - phase of the complex spectrum in frame m-2.
[183] The phase in the lost frame is predicted as:
[184] Refining the phase shift according to this model improves the prediction of the sinusoid in the presence of background noise or if the frequency of the sinusoid changes. For non-overlapping sinusoids with constant frequency and without background noise, the phase shift is the same for all MDCT coefficients that surround the peak.
[185] The concealment used can be provided with different fade-out speeds for the tonal part and for the noise part. If the fade-out of the tonal part of the signal is slower, after multiple lost frames the tonal part becomes dominant. Fluctuations in the sinusoids, due to the different phase shifts of the sinusoidal components, then produce unpleasant artifacts.
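The tonal prediction of the lost MDCT coefficients described in paragraphs [166] to [183], including the refined phase shift, can be condensed into the following sketch. Array names and the calling convention are assumptions, and the handling of multiple consecutively lost frames is omitted.

```python
import numpy as np

def predict_tonal_mdct(c_m2, s_m2, c_m1, peak_bins, delta_phi):
    """For each tonal bin: take the magnitude of the complex spectrum of
    frame m-2, estimate the MDST and hence the phase of frame m-1, refine
    the phase shift, extrapolate the phase to frame m and convert back to an
    MDCT coefficient. delta_phi[k] holds pi*(l + delta_l) of the peak that
    bin k belongs to."""
    q_m2 = np.sqrt(c_m2 ** 2 + s_m2 ** 2)       # magnitude of complex spectrum, frame m-2
    phi_m2 = np.arctan2(s_m2, c_m2)             # phase of complex spectrum, frame m-2

    c_pred = np.zeros_like(c_m1)
    for k in peak_bins:
        s_m1_est = q_m2[k] * np.sin(phi_m2[k] + delta_phi[k])  # MDST estimate, frame m-1
        phi_m1 = np.arctan2(s_m1_est, c_m1[k])                 # estimated phase, frame m-1
        dphi_refined = phi_m1 - phi_m2[k]                      # refined phase shift
        phi_m = phi_m1 + dphi_refined                          # predicted phase, frame m
        c_pred[k] = q_m2[k] * np.cos(phi_m)                    # predicted MDCT coefficient
    return c_pred
```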
[186] In order to overcome this problem, according to models, starting from the third lost frame, the phase difference of the peak (with index k) is used for all the spectral coefficients that surround it (k-l being the left foot index and k+u the right foot index): (a) According to additional models, a transition is provided: the spectral coefficients in the second lost frame with high attenuation use the phase difference of the peak, and the coefficients with small attenuation use the corrected phase difference:
[187] Refinement of the magnitude
[188] According to other models, instead of applying the described phase shift refinement, another approach can be applied using a magnitude refinement:
[189] where l is the index of a peak and the fractional frequency Δl is calculated as described above. The phase shift is:
[190] To avoid an increase in energy, the refined magnitude, according to additional models, can be limited by the magnitude of the second-to-last frame:
[191] Also, according to additional models, a decreasing magnitude can be used to fade the signal out:
[192] Phase prediction using an in-between frame
[193] Instead of basing the prediction of the spectral coefficients on the frames prior to the replacement frame, according to other models, the phase prediction can use an in-between frame (also referred to as an intermediate frame). Fig. 6 illustrates an example of an in-between frame. In Fig. 6 the last frame 600 (m-1) before the replacement frame, the second-to-last frame 602 (m-2) before the replacement frame, and the in-between frame 604 (m-1.5) are illustrated together with the respective MDCT windows 606 to 610.
[194] If the overlap of the MDCT window is less than 50%, it is possible to obtain a CMDCT spectrum closer to the lost frame. In Fig. 6 an example with an MDCT window overlap of 24% is shown. This allows obtaining the CMDCT spectrum for the in-between frame 604 (m-1.5) using the dashed window 610, which is the same as the MDCT window 606 or 608 but shifted by half the frame length of the codec structure. Since the in-between frame 604 (m-1.5) is closer to the lost frame (m), its spectrum characteristics will be more similar to the spectrum characteristics of the lost frame (m) than the spectrum characteristics of the second-to-last frame (m-2) are to those of the lost frame (m).
[195] In this model, the calculation of both the MDST coefficients S_{m-1.5} and the MDCT coefficients C_{m-1.5} is done directly from the decoded time domain signal, with the MDST and MDCT constituting the CMDCT. Alternatively, the CMDCT can be derived using matrix operations from the existing surrounding MDCT coefficients.
[196] The calculation of the energy spectrum is done as described above, and the detection of tonal components is done as described above, with the (m-2)-nd frame replaced by the (m-1.5)-th frame.
[197] For a sinusoidal signal, a shift by N/4 (half the size of the MDCT hop) results in the signal
[198] This results in the phase shift Δφ_{1.5} = (π/2)·(l + Δl). The phase shift therefore depends on the fractional part of the input frequency, plus an additional addition of (l mod 4)·(π/2), where l is the index of a peak. The detection of the fractional frequency is done as described above.
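The half-hop phase advance of paragraph [198] can be written as the following small helper; it is the only quantity that changes with respect to the full-hop case of the earlier sketch, and it is given here only for illustration.

```python
import numpy as np

def in_between_phase_shift(l, delta_l):
    """Phase advance of a sinusoid at bin l (fractional offset delta_l) over
    half an MDCT hop, i.e. a time shift of N/4, as used with the in-between
    frame m-1.5: delta_phi_1_5 = (pi/2) * (l + delta_l)."""
    return 0.5 * np.pi * (l + delta_l)
```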
[199] For the prediction of MDCT coefficients in a lost frame, the magnitude of the m-1.5 frame is used: [200] The lost MDCT coefficient is estimated as: [201] [202] The phase can be calculated using: [203] Also, according to models, the refinement of the phase shift described above can be applied: [204] In addition, the phase shift convergence for all spectral coefficients surrounding a peak to the peak phase shift can be used as described above. [205] Although some aspects of the described concept have been described in the context of an equipment, it is clear that these aspects also represent a description of the corresponding method, in which a block or device corresponds to a step of the method or a characteristic of a step of the method. Likewise, aspects described in the context of a method step also represent a description of a corresponding block or item or characteristic of a corresponding equipment. [206] Depending on certain implementation requirements, models of the invention can be implemented in hardware or software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, Blue-Ray, CD, ROM, PROM, EPROM, EEPROM or a FLASH memory equipped with control signals read electronically on them, cooperating (or that able to cooperate) with a programmable computer system so that the respective method is executed. In this way, the digital storage medium can be read by computer. [207] Some models according to the invention comprise a data carrier equipped with electronically readable control signals, capable of cooperating with a programmable computer system, so that one of the methods described here is performed. [208] As a general rule, models of this invention can be implemented as a computer program product with a program code, the program code being operative to execute one of the methods when the computer program product runs on a computer. The program code can for example be stored in an automatic medium. [209] Other models include the computer program to execute one of the methods described here, stored in an automatic medium. [210] In other words, a model of the innovative method is, therefore, a computer program equipped with a program code for executing one of the methods described here, when the computer program works on a computer. [211] An additional model of the innovative methods is, therefore, a data carrier (or a digital storage medium, or a medium read by computer) comprising, in them, the computer program for the execution of one of the methods here described. [212] An additional model of the innovative method is, therefore, a data stream or a sequence of signals that represent the computer program to execute one of the methods described here. The data stream or signal sequence can for example be configured to be transferred via a data communication link, for example via the Internet. [213] An additional model comprises a processing medium, for example a computer, or a programmable logic device, configured or adapted to execute one of the methods described here. [214] An additional model comprises a computer with the computer program installed on it to perform one of the methods described here. [215] On some models, a programmable logic device (for example, a network of programmable logic gates) can be used to perform some or all of the functionality of the methods described here. In some models, a network of programmable logic gates can cooperate with a microprocessor in order to execute one of the methods described here. 
As a general rule, the methods are preferably executed by any hardware equipment.
[216] The models described above are merely illustrative of the principles of this invention. It should be understood that modifications and variations of the arrangements and details described here will be evident to others skilled in the art. It is therefore intended that the invention be limited only by the scope of the appended patent claims and not by the specific details presented here by way of description and explanation of the models.
[217] References to the Prior Art
[218] [1] P. Lauber and R. Sperschneider, "Error Concealment for Compressed Digital Audio," in AES 111th Convention, New York, USA, 2001.
[219] [2] J.-H. Chen, "Low-complexity, low-delay, scalable and embedded speech and audio coding with adaptive frame loss concealment". US Patent 6,351,730 B2, 2002.
[220] [3] S. K. Gupta, E. Choy and S.-U. Ryu, "Encoder-assisted frame loss concealment techniques for audio coding". US Patent 2007/094009 A1.
[221] [4] S.-U. Ryu and K. Rose, "A Frame Loss Concealment Technique for MPEG-AAC," in 120th AES Convention, Paris, France, 2006.
[222] [5] ISO/IEC JTC1/SC29/WG11, Information technology - Coding of moving pictures and associated audio, International Organization for Standardization, 1993.
[223] [6] S.-U. Ryu and K. Rose, An MDCT domain frame-loss concealment technique for MPEG Advanced Audio Coding, Department of Electrical and Computer Engineering, University of California, 2007.
[224] [7] S.-U. Ryu, Source Modeling Approaches to Enhanced Decoding in Lossy Audio Compression and Communication, University of California, Santa Barbara, 2006.
[225] [8] Y. Mahieux, "Method and apparatus for transmission error concealment of frequency transform coded digital audio signals". European Patent EP 0574288 B1, 1993.
[226] [9] Y. Mahieux, J.-P. Petit and A. Charbonnier, "Transform coding of audio signals using correlation between successive transform blocks," in Acoustics, Speech, and Signal Processing, ICASSP-89, 1989.
[227] [10] 3GPP; Technical Specification Group Services and System Aspects, Extended Adaptive Multi-Rate - Wideband (AMR-WB+) codec, 2009.
[228] [11] A. Taleb, "Partial Spectral Loss Concealment in Transform Codecs". US Patent 7,356,748 B2.
[229] [12] C. Guoming, D. Zheng, H. Yuan, J. Li, J. Lu, K. Liu, K. Peng, L. Zhibin, M. Wu and Q. Xiaojun, "Compensator and Compensation Method for Audio Frame Loss in Modified Discrete Cosine Transform Domain". US Patent 2012/109659 A1.
[230] [13] L. Daudet and M. Sandler, "MDCT Analysis of Sinusoids: Exact Results and Applications to Coding Artifacts Reduction," IEEE Transactions on Speech and Audio Processing, pp. 302-312, 2004.
[231] [14] D. B. Paul, "The Spectral Envelope Estimation Vocoder," IEEE Transactions on Acoustics, Speech, and Signal Processing, pp. 786-794, 1981.
[232] [15] A. Ferreira, "Accurate estimation in the ODFT domain of the frequency, phase and magnitude of stationary sinusoids," 2001 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 47-50, 2001.
Claims (37)

[0001] 1. Method for obtaining spectrum coefficients for a replacement frame of an audio signal, the method comprising: the detection (S206) of a tonal component of a spectrum of an audio signal based on a peak (502) that exists in the spectra of frames (m-1, m-2) prior to a replacement frame (m); for the tonal component of the spectrum, the prediction (S210) of spectrum coefficients for the peak (502) and its environment in the spectrum of the replacement frame (m); and, for the non-tonal component of the spectrum, the use (S214) of a non-predicted spectrum coefficient for the replacement frame (m) or of a corresponding spectrum coefficient of a frame prior to the replacement frame (m); characterized in that the spectrum coefficients for the peak (502) and its environment in the spectrum of the replacement frame (m) are predicted based on magnitudes of the complex spectrum of the penultimate frame (m-2) prior to the replacement frame (m) and on the predicted phase of the complex spectrum of the replacement frame (m), the phase of the complex spectrum of the replacement frame (m) is predicted based on the phase of the complex spectrum of the last frame (m-1) prior to the replacement frame (m) and on a phase shift between the last frame (m-1) and the penultimate frame (m-2) prior to the replacement frame (m), and the phase of the complex spectrum of the last frame (m-1) prior to the replacement frame (m) is determined based on the magnitude of the complex spectrum of the penultimate frame (m-2) prior to the replacement frame (m), on the phase of the complex spectrum of the penultimate frame (m-2) prior to the replacement frame (m), on the phase shift between the last frame (m-1) and the penultimate frame (m-2) prior to the replacement frame (m), and on the real spectrum of the last frame (m-1).

[0002] 2. Method according to claim 1, characterized in that the tonal component is defined by the peak and its environment.

[0003] 3. Method according to claim 1 or 2, characterized in that the environment of the peak is defined by a predefined number of coefficients around the peak (502).

[0004] 4. Method according to any one of claims 1 to 3, characterized in that the environment of the peak comprises a first number of coefficients to the left of the peak (502) and a second number of coefficients to the right of the peak (502).

[0005] 5. Method according to claim 4, characterized in that the first number of coefficients comprises the coefficients between a left foot (508) and the peak (502) plus the coefficient at the left foot (508), and the second number of coefficients comprises the coefficients between a right foot (510) and the peak (502) plus the coefficient at the right foot (510).

[0006] 6. Method according to claim 4 or 5, characterized in that the first number of coefficients to the left of the peak (502) and the second number of coefficients to the right of the peak (502) are the same or different.

[0007] 7. Method according to claim 6, characterized in that the first number of coefficients to the left of the peak (502) is three and the second number of coefficients to the right of the peak (502) is three.

[0008] 8. Method according to any one of claims 3 to 7, characterized in that the predefined number of coefficients around the peak (502) is determined before the step of detecting the tonal component.

[0009] 9. Method according to any one of claims 1 to 8, characterized in that the size of the environment of the peak is adaptable.
[0010] 10. Method according to claim 9, characterized in that the environment of the peak is selected so that the environments around two peaks do not overlap.

[0011] 11. Method according to claim 1, characterized in that the phase shift between the last frame (m-1) and the penultimate frame (m-2) prior to the replacement frame (m) is a refined phase shift, and the refined phase shift is determined based on the phase of the complex spectrum of the last frame (m-1) prior to the replacement frame (m) and on the phase of the complex spectrum of the penultimate frame (m-2) prior to the replacement frame (m).

[0012] 12. Method according to claim 11, characterized in that the refinement of the phase shift is adapted based on the number of consecutively lost frames.

[0013] 13. Method according to claim 12, characterized in that, beginning with a third lost frame, a phase shift determined for a peak is used to predict the spectral coefficients surrounding the peak (502).

[0014] 14. Method according to claim 13, characterized in that, for the prediction of the spectral coefficients in a second lost frame, a phase shift determined for the peak (502) is used to predict the surrounding spectral coefficients when the phase shift in the last frame (m-1) prior to the replacement frame (m) is equal to or less than a predefined threshold, and a phase shift determined for the respective surrounding spectral coefficients is used to predict the surrounding spectral coefficients when the phase shift in the last frame (m-1) prior to the replacement frame (m) is greater than the predefined threshold.

[0015] 15. Method for obtaining spectrum coefficients for a replacement frame (m) of an audio signal, the method comprising: the detection (S206) of a tonal component of a spectrum of an audio signal based on a peak (502) that exists in the spectra of frames (m-1, m-2) prior to a replacement frame (m); for the tonal component of the spectrum, the prediction (S210) of spectrum coefficients for the peak (502) and its environment in the spectrum of the replacement frame (m); and, for the non-tonal component of the spectrum, the use (S214) of a non-predicted spectrum coefficient for the replacement frame (m) or of a corresponding spectrum coefficient of a frame prior to the replacement frame (m); characterized in that the spectrum coefficients for the peak (502) and its environment in the spectrum of the replacement frame (m) are predicted based on magnitudes of the complex spectrum of the last frame (m-1) prior to the replacement frame (m) and on the predicted phase of the complex spectrum of the replacement frame (m), and the phase of the complex spectrum of the replacement frame (m) is predicted based on the phase of the complex spectrum of the penultimate frame (m-2) prior to the replacement frame (m) and on twice the phase shift between the last frame (m-1) and the penultimate frame (m-2) prior to the replacement frame (m).

[0016] 16. Method according to claim 15, characterized in that the magnitudes of the complex spectrum of the last frame (m-1) prior to the replacement frame (m) are refined magnitudes, and the refined magnitudes are determined based on a spectrum coefficient of the real spectrum of the last frame (m-1) prior to the replacement frame (m), on the phase of the complex spectrum of the penultimate frame (m-2) prior to the replacement frame (m) and on the phase shift between the last frame (m-1) and the penultimate frame (m-2) prior to the replacement frame (m).
[0017] 17. Method according to claim 15 or 16, characterized in that the magnitudes of the complex spectrum of the last frame (m-1) prior to the replacement frame (m) are refined magnitudes, and the refined magnitudes are limited by the magnitude of the complex spectrum of the penultimate frame (m-2) prior to the replacement frame (m).

[0018] 18. Method according to any one of claims 1 to 17, characterized in that the detection of a tonal component of the spectrum of the audio signal comprises: the search (S400) for peaks in the spectrum of the last frame (m-1) prior to the replacement frame (m) based on one or more predefined thresholds; the adaptation (S402) of the one or more thresholds; and the search (S404) for peaks in the spectrum of the penultimate frame (m-2) prior to the replacement frame (m) based on the one or more adapted thresholds.

[0019] 19. Method according to claim 18, characterized in that the adaptation of the one or more thresholds comprises the determination of the one or more thresholds for the search for a peak in the penultimate frame (m-2) prior to the replacement frame (m), in a region around a peak found in the last frame (m-1) prior to the replacement frame (m), based on the spectrum and on a spectral envelope of the last frame (m-1) prior to the replacement frame (m), or based on a fundamental frequency calculated from a pitch lag between frames of the audio signal.

[0020] 20. Method according to claim 19, characterized in that the fundamental frequency is determined for the signal comprising the last frame (m-1) prior to the replacement frame (m) and the lookahead of the last frame (m-1) prior to the replacement frame (m).

[0021] 21. Method according to claim 20, characterized in that the lookahead of the last frame (m-1) prior to the replacement frame (m) is calculated on the encoder side using the lookahead.

[0022] 22. Method according to any one of claims 18 to 21, characterized in that the adaptation (S402) of the one or more thresholds comprises setting the one or more thresholds for the search for a peak in the penultimate frame (m-2) prior to the replacement frame (m), in a region not surrounding a peak found in the last frame (m-1) prior to the replacement frame (m), to a predefined threshold value.

[0023] 23. Method according to any one of claims 1 to 22, characterized in that it comprises: the determination (S204), for the replacement frame (m), of whether to use a concealment in the time domain or a concealment in the frequency domain using the prediction of spectral coefficients for tonal components of the audio signal.

[0024] 24. Method according to claim 23, characterized in that the concealment in the frequency domain is applied in case the last frame (m-1) prior to the replacement frame (m) and the penultimate frame (m-2) prior to the replacement frame (m) have a constant pitch, or in case an analysis of one or more frames prior to the replacement frame (m) indicates that a number of tonal components in the signal exceeds a predefined threshold.

[0025] 25. Method according to any one of claims 1 to 24, characterized in that the frames of the audio signal are encoded using MDCT.

[0026] 26. Method according to any one of claims 1 to 25, characterized in that a replacement frame (m) comprises a frame that cannot be processed in an audio signal receiver due to an error in the received data, or a frame that was lost during transmission to the audio signal receiver, or a frame that was not received in time at the audio signal receiver.
[0027] 27. Method according to any one of claims 1 to 26, characterized in that a non-predicted spectrum coefficient is generated using a noise generating method, the noise generating method including signal parasitization, or using a memory of predefined spectrum coefficients, the memory including a lookup table.

[0028] 28. Equipment for obtaining spectrum coefficients for a replacement frame (m) of an audio signal, the equipment comprising: a detector (134) configured to detect a tonal component of a spectrum of an audio signal based on a peak that exists in the spectra of frames prior to a replacement frame (m); and a predictor (138) configured to predict, for the tonal component of the spectrum, the spectrum coefficients for the peak (502) and its environment in the spectrum of the replacement frame (m); characterized in that, for the non-tonal component of the spectrum, a non-predicted spectrum coefficient for the replacement frame (m) or a corresponding spectrum coefficient of a frame prior to the replacement frame (m) is used, the spectrum coefficients for the peak (502) and its environment in the spectrum of the replacement frame (m) are predicted based on magnitudes of the complex spectrum of the penultimate frame (m-2) prior to the replacement frame (m) and on the predicted phase of the complex spectrum of the replacement frame (m), the phase of the complex spectrum of the replacement frame (m) is predicted based on the phase of the complex spectrum of the last frame (m-1) prior to the replacement frame (m) and on a phase shift between the last frame (m-1) and the penultimate frame (m-2) prior to the replacement frame (m), and the phase of the complex spectrum of the last frame (m-1) prior to the replacement frame (m) is determined based on the magnitude of the complex spectrum of the penultimate frame (m-2) prior to the replacement frame (m), on the phase of the complex spectrum of the penultimate frame (m-2) prior to the replacement frame (m), on the phase shift between the last frame (m-1) and the penultimate frame (m-2) prior to the replacement frame (m), and on the real spectrum of the last frame (m-1).

[0029] 29. Equipment for obtaining spectrum coefficients for a replacement frame (m) of an audio signal, characterized in that the equipment is configured to operate according to the method as described in any one of claims 1 to 27.

[0030] 30. Audio decoder, characterized in that it comprises equipment as described in claim 28 or 29.

[0031] 31. Audio receiver, characterized in that it comprises an audio decoder as described in claim 30.

[0032] 32. System for the transmission of audio signals, characterized in that the system comprises: an encoder (100) configured to generate an encoded audio signal; and a decoder (120) as described in claim 30, configured to receive the encoded audio signal and to decode the encoded audio signal.
[0033] 33. Apparatus for obtaining spectrum coefficients for a replacement frame (m) of an audio signal, the apparatus comprising: a detector (134) configured to detect a tonal component of a spectrum of an audio signal based on a peak (502) that exists in the spectra of frames prior to a replacement frame (m); and a predictor (138) configured to predict, for the tonal component of the spectrum, the spectrum coefficients for the peak (502) and its environment in the spectrum of the replacement frame (m); characterized in that, for the non-tonal component of the spectrum, a non-predicted spectrum coefficient for the replacement frame (m) or a corresponding spectrum coefficient of a frame prior to the replacement frame (m) is used, the spectrum coefficients for the peak (502) and its environment in the spectrum of the replacement frame (m) are predicted based on magnitudes of the complex spectrum of the last frame (m-1) prior to the replacement frame (m) and on the predicted phase of the complex spectrum of the replacement frame (m), and the phase of the complex spectrum of the replacement frame (m) is predicted based on the phase of the complex spectrum of the penultimate frame (m-2) prior to the replacement frame (m) and on twice the phase shift between the last frame (m-1) and the penultimate frame (m-2) prior to the replacement frame (m).

[0034] 34. Apparatus for obtaining spectrum coefficients for a replacement frame (m) of an audio signal, characterized in that it is configured to operate according to the method of claim 19.

[0035] 35. Audio decoder, characterized in that it comprises an apparatus according to claim 33.

[0036] 36. Audio decoder, characterized in that it comprises an audio decoder (120) according to claim 34.

[0037] 37. System for the transmission of audio signals, characterized in that it comprises: an encoder (100) configured to generate an encoded audio signal; and a decoder (120), as described in claim 35, configured to receive the encoded audio signal and to decode the encoded audio signal.
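The claims above define the concealment procedure in prose. As an illustration only, the following is a short, hedged Python sketch of the flow outlined in claims 1, 15 and 18: peaks are detected in the spectra of the two frames preceding the replacement frame, the phase of each peak environment is extrapolated linearly, and the remaining (non-tonal) coefficients are taken over from the last good frame. All function names, the 20 dB threshold and the environment of three coefficients per side (claim 7) are illustrative assumptions, not the patent's reference implementation.

# Hedged illustrative sketch (not the patent's reference implementation) of
# frame-loss concealment with tonal-component prediction.
import numpy as np

def detect_peaks(mag, threshold_db=20.0):
    """Return indices k where mag[k] is a local maximum above a simple power threshold."""
    ref = np.mean(mag) * 10 ** (threshold_db / 20.0)
    peaks = []
    for k in range(1, len(mag) - 1):
        if mag[k] > mag[k - 1] and mag[k] >= mag[k + 1] and mag[k] > ref:
            peaks.append(k)
    return peaks

def conceal_frame(C_m2, C_m1, real_m1):
    """Predict the spectrum of the lost frame m.

    C_m2, C_m1 : complex spectra (e.g. MDCT + MDST) of frames m-2 and m-1.
    real_m1    : real (MDCT) spectrum of the last good frame m-1.
    """
    mag_m1, ph_m1 = np.abs(C_m1), np.angle(C_m1)
    ph_m2 = np.angle(C_m2)

    # Non-tonal part: reuse the corresponding coefficients of the last good frame
    # (signal parasitization / noise generation could be used here instead).
    c_m = real_m1.copy()

    # Tonal part: peaks present in both preceding frames (claim 18, simplified:
    # no threshold adaptation between the two searches).
    peaks_m2 = set(detect_peaks(np.abs(C_m2)))
    peaks = [k for k in detect_peaks(mag_m1) if k in peaks_m2]

    side = 3  # assumed environment: three coefficients left and right of the peak
    for p in peaks:
        for k in range(max(0, p - side), min(len(c_m), p + side + 1)):
            dphi = ph_m1[k] - ph_m2[k]          # phase shift between m-1 and m-2
            phi_m = ph_m1[k] + dphi             # linear phase extrapolation
            c_m[k] = mag_m1[k] * np.cos(phi_m)  # predicted real coefficient
    return c_m

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, k0 = 256, 40
    # Toy complex spectra with one strong tonal peak at bin k0.
    base = rng.normal(scale=0.01, size=n) + 1j * rng.normal(scale=0.01, size=n)
    C_m2 = base.copy(); C_m2[k0] = 1.0 * np.exp(1j * 0.3)
    C_m1 = base.copy(); C_m1[k0] = 1.0 * np.exp(1j * 0.8)
    print(conceal_frame(C_m2, C_m1, C_m1.real)[k0 - 1:k0 + 2])

The sketch deliberately omits the refinements of claims 11 to 17 (refined phase shift, refined and limited magnitudes) and the threshold adaptation of claims 18 to 22.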
Legal status:
Date | Code | Event
2018-11-06 | B06F | Objections, documents and/or translations needed after an examination request according to [chapter 6.6 patent gazette]
2020-06-02 | B06U | Preliminary requirement: requests with searches performed by other patent offices: procedure suspended [chapter 6.21 patent gazette]
2020-12-08 | B09A | Decision: intention to grant [chapter 9.1 patent gazette]
2021-02-23 | B16A | Patent or certificate of addition of invention granted [chapter 16.1 patent gazette]. Free format text: Term of validity: 20 (twenty) years counted from 20/06/2014, subject to the legal conditions.
Priority:
Application number | Filing date | Title
EP13173161 | 2013-06-21 |
EP13173161.4 | 2013-06-21 |
EP14167072 | 2014-05-05 |
EP14167072.9 | 2014-05-05 |
PCT/EP2014/063058 | WO2014202770A1 | 2013-06-21 | 2014-06-20 | Method and apparatus for obtaining spectrum coefficients for a replacement frame of an audio signal, audio decoder, audio receiver and system for transmitting audio signals